Abstract: Marker-based motion capture (MoCap) systems can comprise
several dozen cameras that reconstruct the trajectories of hundreds
of targets. With a large number of cameras, it becomes interesting
to determine the optimal reconstruction strategy. To this end, it is
of fundamental importance to understand the information provided by
different camera measurements and how they are combined, i.e. how
the reconstruction error changes when different cameras are
considered. In this work, an approximation of the reconstruction
error variance is derived. Simulation results suggest that the
proposed approximation closely matches the real error variance while
significantly reducing the computational time.

I. Introduction 

Marker-based motion capture (MoCap) systems can nowadays comprise
several dozen cameras that reconstruct the trajectories of hundreds
of targets. As the cost of modern microprocessors and camera
hardware decreases, it becomes economically viable to consider MoCap
systems built on large networks of several hundred cameras, meeting
the growing demand for higher-precision reconstruction of larger
scenarios. This requirement, in terms of both minimizing the
single-target estimation error and increasing the quality of the
scene description, translates into scaling up both the number of
markers and the number of cameras.

The MoCap task can typically be divided into two steps:
reconstructing the 3D target positions from the measurements at
time t, and merging these reconstructions with the dynamic evolution
of previously detected targets (data association and tracking). This
paper focuses on the first step. If the system comprises a limited
number of cameras and targets, the classical reconstruction
algorithm based on geometric triangulation [7], [8], [16] can be
implemented in a centralized fashion on a single machine to track
the targets in real time. In the envisaged large-scale scenarios, on
the other hand, it becomes difficult to take the data provided by
all the cameras into account simultaneously. Instead, only portions
of the system are first considered simultaneously, and the 3D
reconstruction is then achieved by progressively merging data from
different parts of the system. In this framework, how the
information from different cameras is processed and merged matters:
some pairs of cameras allow a better reconstruction than others.

This work deals with the problem of determining the information
provided by different cameras about a target and, consequently,
which cameras allow the optimal reconstruction of the targets under
investigation.

Although the problem is formulated here in the MoCap framework, it
is closely related to other areas of computer vision. In the
structure-from-motion framework [13], a quality measure among
tensors is derived in [14] for reconstruction based on a
hierarchical computational structure of trifocal tensors.
Furthermore, in the multi-view stereo context, [6] and [5] use
suitable "affinity" functions to select a set of "optimal" views.

II. Reconstruction error statistical description 

Because of noise and discrete measurements, the ray associated by a
camera with a target's position is only an estimate of the "mean"
direction along which the target lies. The uncertainty on the 3D
position provided by this mean direction grows as the distance
between the viewing camera and the target increases (and it is
parallel to the sensor plane, which in practice is approximately
orthogonal to the ray direction).

The reconstruction error obtained with two cameras depends on the
cameras' positions and orientations: cameras at orthogonal
orientations typically provide reconstructions with the smallest
estimation errors, whereas cameras viewing along the same direction
provide the largest. However, because of the different views and of
occlusions (due both to other objects in the scene and to the
objects carrying the targets), cameras at very different positions
and orientations usually retrieve measurements of only a few common
targets.

In this section, the uncertainties on single and multi- 
camera reconstructions are presented in detail. 

A. Single camera measurements 

Consider a target i placed at $\phi_i = (x_i, y_i, z_i)^T$ in the
3D space. The noise affecting the measurement
$\xi_{ij} = (u_{ij}, v_{ij})^T$ of the target on the j-th camera is
assumed to be additive and Gaussian [7], [8]:

$$\xi_{ij} = \bar{\xi}_{ij} + e_{ij} \qquad (1)$$

where $\bar{\xi}_{ij} = [\bar{u}_{ij}, \bar{v}_{ij}]^T$ is the
measurement without noise and
$e_{ij} \sim \mathcal{N}(0, \Sigma_{e_{ij}})$ is the measurement
noise. Hereafter the noise variance matrix $\Sigma_{e_{ij}}$ is
modeled as $\mathrm{diag}(\sigma_{e_{ij}}^2, \sigma_{e_{ij}}^2)$,
where $\sigma_{e_{ij}}$ is the standard deviation. Note that
$\sigma_{e_{ij}}$ typically depends on the reciprocal positions of
camera and target (and on the camera orientation). In addition, the
value of $\sigma_{e_{ij}}$ depends on the image analysis algorithm
used to detect the target. Since complete coverage of this topic is
beyond the scope of this work, hereafter the value of
$\sigma_{e_{ij}}$ is taken as known.

Footnote 1: The concept of reconstruction quality considered here is
meant to account for several factors, among them the number of
reconstructed targets, the reconstruction accuracy, and the required
computational time.
Each measurement from camera j is a point on its sensor, which
corresponds to a ray passing through that point and the camera's
optical center, as shown in Fig. 1.

Fig. 1. Red line: ray associated with measurement
$(u_{ij}, v_{ij})$ of target i in camera j.

If no additional information is available (e.g. the size of the
target), the target's 3D position cannot be reconstructed from a
single measurement. It can, however, be reconstructed by geometric
triangulation using at least two measurements, as shown in Fig. 2.




Fig. 2. Triangulation between two cameras. The crossing point of the
rays determined by different cameras gives the target's 3D position.

Let the plane $\mathcal{V}_{ij}$ be parallel to the image plane
$I_j$ of camera j and pass through target i. Because of the
measurement noise $e_{ij}$, the ray associated with target i by
camera j intersects the plane $\mathcal{V}_{ij}$ at a point
$\hat{\phi}_{ij} \neq \phi_i$.

Let $e'_{ij} = \hat{\phi}_{ij} - \phi_i$. Then $e'_{ij}$ is obtained
by propagation of the error $e_{ij}$ according to:

$$f \, e'_{ij} = f' \, e_{ij} \qquad (2)$$

where $f$ is the camera focal length and $f'$ is the distance from
$I_j$ to $\mathcal{V}_{ij}$. Actually, the above equation holds for
all planes $\mathcal{V}_j$ (on the front side of camera j) parallel
to $I_j$ (see Fig. 3).
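
As a quick numerical illustration of (2), assume a focal length
f = 1000 pixels, a sensor noise standard deviation of 0.5 pixels,
and f' = 10 m. These are hypothetical values chosen for the example,
not taken from this work. Since (2) is linear in the error, the
standard deviation scales by the same factor f'/f:

```latex
\sigma_{e'_{ij}} = \frac{f'}{f}\,\sigma_{e_{ij}}
                 = \frac{10\ \mathrm{m}}{1000\ \mathrm{px}} \times 0.5\ \mathrm{px}
                 = 5 \times 10^{-3}\ \mathrm{m}
```

Under the same assumptions, doubling the distance to 20 m doubles
the in-plane uncertainty to 10 mm, in line with the remark that the
uncertainty grows with the camera-target distance.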

Fig. 3. Propagation of camera measurement error.

While the measurement error propagates on the plane $\mathcal{V}_j$
as described, the measurement does not provide any information about
the target position along the line starting from the optical center
$O_j$ and passing through $\xi_{ij}$.

Exploiting different measurements of the same target yields a good
estimate of the real distance $f'$. Therefore, by combining several
camera measurements, the information provided by camera j about
target i can be modeled as:

$$\hat{\phi}_{ij} \sim \mathcal{N}(\phi_i, \Sigma_{ij}) \qquad (3)$$

where

$$\Sigma_{ij} = M \psi_{ij} \psi_{ij}^T + \left( \frac{f'}{f} \right)^2 \sigma_{e_{ij}}^2 \, \Psi_{ij} \Psi_{ij}^T \qquad (4)$$

and $M$ is a number much larger than the maximum side length of the
room multiplied by m (the number of cameras), $\psi_{ij}$ is the
unit vector along the direction from $O_j$ to $\phi_i$, and
$\Psi_{ij}$ is an orthonormal basis of the plane $P_{ij}$ parallel
to $I_j$ and passing through the origin. Since $M$ is very large,
the first term in (4) expresses the practical absence of information
provided by camera j about target i along the $\psi_{ij}$ direction,
i.e. along the line from $O_j$ to the point. The second term instead
corresponds to the variance of the measurement error propagated to
the plane $P_{ij}$ using (2). We stress that the approximation (4)
of the reconstruction error variance is good in nonsingular
conditions, i.e. when the target position can be adequately
reconstructed (a typical operating condition when a large number of
cameras is used). Experimental evidence of the quality of the
approximation (4) in the framework of multiple-camera reconstruction
is given by the simulations in Subsec. II-B.
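
The following is a minimal Python sketch of how (4) could be
evaluated for one camera-target pair. The function name
`single_camera_cov`, its parameters, and the numeric values are
illustrative assumptions and not part of the original work. Two
simplifications are assumed: f' is taken as the depth of the target
along the optical axis, and the projector onto the plane parallel to
I_j is computed as I - n n^T, with n the unit optical axis, which
coincides with the product of an orthonormal basis of that plane and
its transpose.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def single_camera_cov(center, axis, target, focal, sigma_e, big_m):
    """Approximate 3x3 covariance of the information camera j gives on target i.

    center : optical center O_j
    axis   : optical axis of the camera (normal of the image plane)
    target : (estimated) 3D position of the target
    focal  : focal length f, in the same units as sigma_e (e.g. pixels)
    big_m  : the large constant M modelling "no information" along the ray
    """
    n = unit(axis)
    ray = [t - c for t, c in zip(target, center)]
    psi = unit(ray)                                  # direction from O_j to target
    depth = sum(r * k for r, k in zip(ray, n))       # stands in for f' (assumption)
    scale = (depth / focal * sigma_e) ** 2           # propagated variance, from (2)
    # First term: M psi psi^T. Second term: scale * (I - n n^T).
    return [[big_m * psi[r] * psi[c] + scale * ((r == c) - n[r] * n[c])
             for c in range(3)] for r in range(3)]

# Hypothetical example: camera 10 m away on the x axis, looking at the origin.
cov = single_camera_cov(center=[10.0, 0.0, 0.0], axis=[-1.0, 0.0, 0.0],
                        target=[0.0, 0.0, 0.0], focal=1000.0,
                        sigma_e=0.5, big_m=1e6)
for row in cov:
    print(row)
```

In this example the ray is aligned with the x axis, so the large
value M appears only in the x-x entry, while the two directions
parallel to the sensor carry the small propagated variance.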

B. Multiple camera reconstruction 

The approximation in (4) is particularly useful when combining
measurements from different cameras.

Without loss of generality, consider the reconstruction of the
position of target i from the measurements of cameras
$j = 1, 2, \ldots, m_i$. When at least two non-aligned measurements
are available, the position of the target can be estimated by
geometric triangulation. Then, $f'$ in (2) is approximately known,
and (3) is a good approximation of the information provided by each
camera j among those available for the reconstruction of target i.
Thus, from (3) the uncertainty on the reconstructed position can be
approximated as follows (minimum variance estimation, [10]):

$$\Sigma_i = \left( \sum_{j=1}^{m_i} \Sigma_{ij}^{-1} \right)^{-1} \qquad (5)$$

and the overall standard deviation of the reconstruction error can
be estimated as $\sqrt{\mathrm{trace}(\Sigma_i)}$.
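
The fusion step can be sketched in a few lines of Python, under the
assumption that (5) is the standard minimum-variance combination,
i.e. the inverse of the sum of the inverse per-camera covariances.
The helper names (`inv3`, `fuse`, `overall_std`) and the two example
covariances are hypothetical and only serve as an illustration.

```python
import math

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def fuse(covariances):
    """Combine per-camera covariances: inverse of the summed inverses."""
    info = [[0.0] * 3 for _ in range(3)]
    for cov in covariances:
        inv = inv3(cov)
        for r in range(3):
            for c in range(3):
                info[r][c] += inv[r][c]
    return inv3(info)

def overall_std(cov):
    """Overall standard deviation: square root of the trace."""
    return math.sqrt(cov[0][0] + cov[1][1] + cov[2][2])

# Hypothetical example: two cameras viewing along x and along y,
# each with in-plane variance S2 and "no information" (M) along its ray.
BIG_M = 1e6
S2 = 2.5e-5
cov_x = [[BIG_M, 0.0, 0.0], [0.0, S2, 0.0], [0.0, 0.0, S2]]
cov_y = [[S2, 0.0, 0.0], [0.0, BIG_M, 0.0], [0.0, 0.0, S2]]

fused = fuse([cov_x, cov_y])
print(overall_std(fused))
```

With these two orthogonal views each camera fills in the direction
the other cannot observe, and the vertical direction, seen by both,
ends up with half the single-camera variance.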

For comparison, a direct evaluation of the reconstruction error
variance can be obtained as the sample reconstruction variance in a
Monte Carlo (MC) simulation:

$$\hat{\Sigma}_i = \frac{1}{N} \sum_{k=1}^{N} \delta_{ik} \delta_{ik}^T \qquad (6)$$

where $N$ is a large integer and $\delta_{ik}$ is the reconstruction
error (difference between true and reconstructed position) in MC
iteration k. Fig. 4 shows the percent error between the sample
reconstruction standard deviation from (6) and that computed from
the approximation (5), varying the number of cameras m considered
for reconstruction from 2 to 256 (cameras are equally spaced along a
circle of 10 m radius). The reported values are the mean of the
results obtained on 1000 randomly sampled points (all positioned in
the volume delimited by the cameras) for each choice of m. At each
iteration, the m cameras used for the reconstruction are randomly
selected among the 256 available.
Fig. 4. Percent difference between the standard deviation computed
from samples (6) and the approximated theoretical one (5).
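
A much reduced version of this comparison can be sketched in Python.
The setup below is an assumption made for illustration and is not
the simulation used for Fig. 4: only two cameras on a 10 m circle, a
target at the circle center, cameras looking straight at the target,
noise applied directly on the plane through the target (i.e. already
propagated through (2)), and a simple least-squares ray intersection
standing in for the triangulation algorithm. All function names are
hypothetical.

```python
import math
import random

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def projector(d):
    """I - d d^T: projector onto the plane orthogonal to the unit vector d."""
    return [[(r == c) - d[r] * d[c] for c in range(3)] for r in range(3)]

def add(a, b):
    """Entry-wise sum of two 3x3 matrices."""
    return [[a[r][c] + b[r][c] for c in range(3)] for r in range(3)]

def matvec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def triangulate(centers, directions):
    """Point minimizing the sum of squared distances to the given rays."""
    lhs = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0, 0.0, 0.0]
    for o, d in zip(centers, directions):
        p = projector(d)
        lhs = add(lhs, p)
        rhs = [x + y for x, y in zip(rhs, matvec(p, o))]
    return matvec(inv3(lhs), rhs)

def compare(angle, radius=10.0, s=0.005, big_m=1e6, n_iter=2000, seed=0):
    """Return (MC std, closed-form std) for a target at the circle center."""
    rng = random.Random(seed)
    centers = [[radius, 0.0, 0.0],
               [radius * math.cos(angle), radius * math.sin(angle), 0.0]]
    psis = [[-c / radius for c in o] for o in centers]   # camera -> target
    acc = [[0.0] * 3 for _ in range(3)]
    for _ in range(n_iter):
        dirs = []
        for o, psi in zip(centers, psis):
            tangent = [-psi[1], psi[0], 0.0]             # horizontal, orthogonal to psi
            a, b = rng.gauss(0.0, s), rng.gauss(0.0, s)
            # Noisy ray: from O_j through the target (at the origin) displaced
            # by a along the tangent and by b along the vertical axis.
            ray = [a * tangent[k] + (b if k == 2 else 0.0) - o[k] for k in range(3)]
            norm = math.sqrt(sum(x * x for x in ray))
            dirs.append([x / norm for x in ray])
        err = triangulate(centers, dirs)                 # true position is the origin
        acc = add(acc, [[err[r] * err[c] for c in range(3)] for r in range(3)])
    mc_var = sum(acc[k][k] for k in range(3)) / n_iter   # trace of the sample variance
    info = [[0.0] * 3 for _ in range(3)]
    for psi in psis:
        p = projector(psi)
        # Closed-form inverse of the single-camera covariance, valid here
        # because each camera looks straight at the target.
        info = add(info, [[p[r][c] / s ** 2 + psi[r] * psi[c] / big_m
                           for c in range(3)] for r in range(3)])
    fused = inv3(info)                                   # fused covariance
    return math.sqrt(mc_var), math.sqrt(sum(fused[k][k] for k in range(3)))

mc_std, approx_std = compare(math.pi / 2)
print(f"MC: {mc_std * 1000:.2f} mm, closed form: {approx_std * 1000:.2f} mm")
```

The closed-form value needs two 3x3 inversions, whereas the MC
estimate repeats the triangulation thousands of times, which
illustrates where the computational saving comes from.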

Fig. 5 highlights how the reconstruction error (using two cameras)
depends on the angle between the cameras: the closer the angle is to
$\pi/2$, the better the estimated position. In this example, 16
cameras are positioned (equally spaced) on a circle of 10 m radius.
The reconstruction error is computed for the point at the center of
the cameras' circle. As shown in Fig. 5, the $1\sigma$-level curve
computed from (5) practically overlaps the $1\sigma$-level curve
estimated from sample data.
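
The dependence on the angle can also be explored with the closed
form alone. The Python sketch below assumes two cameras looking
straight at a target in the center of the circle, so that the
optical axis coincides with the ray direction, and uses hypothetical
values for the in-plane standard deviation `s` and for the constant
`big_m`. The function names are illustrative.

```python
import math

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def camera_cov(psi, s, big_m):
    """Single-camera covariance for a camera looking along the unit vector psi."""
    return [[big_m * psi[r] * psi[c] + s * s * ((r == c) - psi[r] * psi[c])
             for c in range(3)] for r in range(3)]

def pair_std(angle, s=0.005, big_m=1e6):
    """Overall std of a two-camera reconstruction for a given viewing angle."""
    dirs = [[1.0, 0.0, 0.0], [math.cos(angle), math.sin(angle), 0.0]]
    info = [[0.0] * 3 for _ in range(3)]
    for psi in dirs:
        inv = inv3(camera_cov(psi, s, big_m))
        info = [[info[r][c] + inv[r][c] for c in range(3)] for r in range(3)]
    fused = inv3(info)
    return math.sqrt(fused[0][0] + fused[1][1] + fused[2][2])

# Same angles as those considered in Fig. 5.
for k in (1, 2, 3, 4):
    print(f"angle = {k}*pi/8: std = {pair_std(k * math.pi / 8) * 1000:.2f} mm")
```

Under these assumptions the printed standard deviation decreases
monotonically as the angle grows from pi/8 to pi/2, in agreement
with the trend described for Fig. 5.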

Fig. 5. Comparison of the $1\sigma$-level curves of the
reconstruction error: variance obtained by MC simulation (blue) and
theoretical approximation (red). The error is evaluated for
different angles between the two cameras: $\pi/8$ (represented by
the external ellipse), $\pi/4$, $3\pi/8$, and $\pi/2$ (small circle
inside the other curves).

The performance evaluation of a MoCap system typically requires
computing the reconstruction error on a (quite large) representative
number of points (voxels). Since the MC variance estimation can be
quite time-consuming, it is worth considering (5), which allows good
approximations to be computed in closed form (as in Figs. 4 and 5)
at a computational cost far lower than that of the MC method.

III. Conclusions 

In this work, an approximation of the reconstruction error variance
has been derived for marker-based motion capture systems. This
approximation can be useful in deriving the optimal strategy for
pairing cameras, so as to reduce the reconstruction computational
time in a distributed approach.